A Partition-Based Suffix Tree Construction and Its Applications

نویسندگان

  • Hongwei Huo
  • Vojislav Stojkovic
چکیده

A suffix tree (also called suffix trie, PAT tree or, position tree) is a powerful data structure that presents the suffixes of a given string in a way that allows a fast implementation of important string operations. The idea behind suffix trees is to assign to each symbol of a string an index corresponding to its position in the string. The first symbol in the string will have the index 1, the last symbol in the string will have the index n, where n = number of symbols in the string. These indexes instead of actual objects are used for the suffix tree construction. Suffix trees provide efficient access to all substrings of a string. They are used in string processing (such as string search, the longest repeated substring, the longest common substring, the longest palindrome, etc), text processing (such as editing, free-text search, etc), data compression, data clustering in search machines, etc. Suffix trees are important and popular data structures for processing long DNA sequences. Suffix trees are often used for efficient solving a variety computational biology and/or bioinformatics problems (such as searching for patterns in DNA or protein sequences, exact and approximate sequence matching, repeat finding, anchor finding in genome alignment, etc). A suffix tree displays the internal structure of a string in a deeper way. It can be constructed and represented in time and space proportional to the length of a sequence. A suffix tree requires affordable amount of memory. It can be fitted completely in the main memory of the present desktop computers. The linear construction time and space and the short search time are good features of suffix trees. They increase the importance of suffix trees. A suffix tree construction process is space demanding and may be a fatal in the case of a suffix tree to handle a huge number of long DNA sequences. Increasing the number of sequences to be handled, due to random access, causes degrades of the suffix tree construction process performance that uses suffix links. Thus, some approaches completely abandon the use of suffix link and give up the theoretically superior linear construction time for a quadratic time algorithm with better locality of reference.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Dynamic Approach to Weighted Suffix Tree Construction Algorithm

In present time weighted suffix tree is consider as a one of the most important existing data structure used for analyzing molecular weighted sequence. Although a static partitioning based parallel algorithm existed for the construction of weighted suffix tree, but for very long weighted DNA sequences it takes significant amount of time. However, in our implementation of dynamic partition based...

متن کامل

ERA: Efficient Serial and Parallel Suffix Tree Construction for Very Long Strings

The suffix tree is a data structure for indexing strings. It is used in a variety of applications such as bioinformatics, time series analysis, clustering, text editing and data compression. However, when the string and the resulting suffix tree are too large to fit into the main memory, most existing construction algorithms become very inefficient. This paper presents a disk-based suffix tree ...

متن کامل

A New Parallel Partition Algorithm for Parallel Suffix Tree Construction

The suffix tree is a compacted trie of all suffixes of a given string. It is a fundamental data structure in a wide range of domains such as text processing, data compression, computer vision, computational biology, and so on [1]. Moreover, it can be used for network researches such as web analysis, which has been studied actively [2], [3]. For example, suffix trees have been utilized to effect...

متن کامل

Suffix Trees and Suffix Arrays

Iowa State University 1.1 Basic Definitions and Properties . . . . . . . . . . . . . . . . . . . . 1-1 1.2 Linear Time Construction Algorithms . . . . . . . . . . . . . 1-4 Suffix Trees vs. Suffix Arrays • Linear Time Construction of Suffix Trees • Linear Time Construction of Suffix Arrays • Space Issues 1.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ...

متن کامل

Dynamic extended suffix arrays

The suffix tree data structure has been intensively described, studied and used in the eighties and nineties, its linear-time construction counterbalancing his spaceconsuming requirements. An equivalent data structure, the suffix array, has been described by Manber and Myers in 1990. This space-economical structure has been neglected during more than a decade, its construction being too slow. S...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012